Introducing SETA: Open Source RL Environments for Terminal Agents
Explore SETA, a toolkit with 400 RL tasks tailored for terminal agents.
Salesforce AI's xRouter uses reinforcement learning and a cost-aware reward to route queries among 20+ LLMs, approaching top-model accuracy while dramatically cutting costs.
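To make the cost-aware idea concrete, a routing reward might pay for a correct answer and charge for the dollar cost of the model that produced it. The sketch below illustrates that shape; the weighting and the success signal are illustrative assumptions, not xRouter's published formulation.

```python
# Hypothetical sketch of a cost-aware routing reward (not xRouter's
# actual reward): pay for correctness, charge for model cost.
def routing_reward(correct: bool, cost_usd: float, cost_weight: float = 10.0) -> float:
    return (1.0 if correct else 0.0) - cost_weight * cost_usd

# Example: a cheap model that answers correctly beats an expensive one.
print(routing_reward(correct=True, cost_usd=0.002))  # 0.98
print(routing_reward(correct=True, cost_usd=0.05))   # 0.5
```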
Grok 4.1 brings two modes that top LMArena leaderboards, boosting perceived helpfulness and cutting hallucinations for informational queries while exposing alignment tradeoffs in deception and sycophancy.
Meta's DreamGym synthesizes environment interactions as text using a reasoning experience model and grounded replay memory, cutting real rollouts and boosting RL performance across web benchmarks.
A compact neural agent learns to plan, store, and compose symbolic tools end-to-end with reinforcement learning, demonstrating emergent multi-step reasoning on synthetic arithmetic tasks.
SkyRL tx v0.1.0 brings a Tinker-compatible training and inference engine to local GPU clusters, adding end-to-end RL support, faster sampling, and Postgres support.
DeepAgent merges thinking, tool search, tool calls, and memory compression into a single reasoning stream, enabling dynamic tool discovery across tens of thousands of APIs and improved long-horizon performance.
Microsoft open-sourced Agent Lightning to convert agent execution traces into RL transitions, enabling training of LLM policies with minimal integration and support for standard RL trainers.
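To illustrate the trace-to-transition idea, each LLM call in a trace can become a (state, action, reward) record, with the episode's final reward assigned to the last step. The record layout and credit assignment below are assumptions for illustration, not Agent Lightning's actual API.

```python
# Hypothetical sketch of converting an agent execution trace into RL
# transitions; field names and credit assignment are assumptions.
from dataclasses import dataclass

@dataclass
class Transition:
    state: str     # the prompt/context the LLM saw at this step
    action: str    # the text the LLM produced
    reward: float  # credit assigned to this step

def trace_to_transitions(trace: list[dict], final_reward: float) -> list[Transition]:
    # Simplest credit assignment: the whole episode reward lands on the last call.
    return [
        Transition(step["prompt"], step["response"],
                   final_reward if i == len(trace) - 1 else 0.0)
        for i, step in enumerate(trace)
    ]
```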
Learn how to create a custom trading environment and train multiple RL agents with Stable-Baselines3, then evaluate and visualize their performance to find the best strategy.
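In that spirit, here is a minimal runnable sketch of the pattern: a toy Gymnasium trading environment over synthetic random-walk prices, trained and compared across several Stable-Baselines3 algorithms. The observation layout, reward, and hyperparameters are illustrative assumptions, not the tutorial's exact setup.

```python
import numpy as np
import gymnasium as gym
from gymnasium import spaces
from stable_baselines3 import A2C, DQN, PPO
from stable_baselines3.common.evaluation import evaluate_policy

class TradingEnv(gym.Env):
    """Toy single-asset environment: stay flat (0) or hold one unit long (1)."""

    def __init__(self, n_steps: int = 200):
        super().__init__()
        self.n_steps = n_steps
        self.action_space = spaces.Discrete(2)
        # Observation: [current price, current position]
        self.observation_space = spaces.Box(
            low=-np.inf, high=np.inf, shape=(2,), dtype=np.float32
        )

    def _obs(self):
        return np.array([self.prices[self.t], self.position], dtype=np.float32)

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        # Synthetic random-walk prices (illustrative data, not real markets).
        self.prices = 100.0 + np.cumsum(self.np_random.normal(0.0, 1.0, self.n_steps))
        self.t = 0
        self.position = 0
        return self._obs(), {}

    def step(self, action):
        self.position = int(action)
        # Reward: the price move captured while long, zero while flat.
        delta = self.prices[self.t + 1] - self.prices[self.t]
        reward = float(delta) if self.position == 1 else 0.0
        self.t += 1
        terminated = self.t >= self.n_steps - 1
        return self._obs(), reward, terminated, False, {}

# Train several agents on the same environment and compare mean episode reward.
for algo in (PPO, A2C, DQN):
    model = algo("MlpPolicy", TradingEnv(), verbose=0)
    model.learn(total_timesteps=5_000)
    mean_reward, _ = evaluate_policy(model, TradingEnv(), n_eval_episodes=5)
    print(f"{algo.__name__}: mean reward {mean_reward:.2f}")
```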
UltraCUA introduces a hybrid action model that lets agents mix GUI primitives with programmatic tool calls, improving success rates and reducing steps across desktop automation benchmarks.
W4S trains a 7B meta-agent to program Python workflows that call stronger LLM executors, using offline RL to iteratively generate, execute, and refine solutions. The approach yields consistent gains across 11 benchmarks and achieves Pass@1 of 95.4 on HumanEval with GPT-4o-mini.
QeRL uses NVFP4 weight quantization plus LoRA and AQN to boost rollout throughput and exploration, allowing a 32B policy to be trained on a single H100 with competitive accuracy.
RA3 formalizes mid-training as pruning plus horizon shortening and uses temporal action abstractions to accelerate RL post-training, boosting code generation benchmarks.
AgentFlow introduces a modular Planner–Executor–Verifier–Generator architecture and Flow-GRPO, a token-level on-policy method that trains only the Planner, reporting substantial gains across ten benchmarks and providing an open-source MIT-licensed implementation.
MoonshotAI released checkpoint-engine, a middleware that updates model weights across thousands of GPUs in about 20 seconds, enabling fast RL and large-scale LLM serving with minimal downtime.
MIT shows that on-policy reinforcement learning preserves prior capabilities better than supervised fine-tuning by minimizing the forward KL divergence between the base and fine-tuned models.
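For reference, the forward KL divergence in that claim takes the expectation under the base model's own distribution; the notation below is ours, spelled out for readers, not the paper's exact statement.

```latex
% Forward KL between base and fine-tuned policies, with the
% expectation under the base model's own outputs:
D_{\mathrm{KL}}\!\left(\pi_{\mathrm{base}} \,\|\, \pi_{\mathrm{ft}}\right)
  = \mathbb{E}_{y \sim \pi_{\mathrm{base}}(\cdot \mid x)}
    \left[ \log \frac{\pi_{\mathrm{base}}(y \mid x)}{\pi_{\mathrm{ft}}(y \mid x)} \right]
```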
Biomni-R0 applies end-to-end reinforcement learning and expert rewards to train 8B and 32B biomedical agents that outperform much larger general models on multi-step reasoning tasks.
Alibaba's Qwen team introduced GUI-Owl and Mobile-Agent-v3, a unified multimodal agent and multi-agent framework that automates GUI tasks across mobile and desktop with state-of-the-art benchmark performance.
Zhipu AI's ComputerRL combines programmatic APIs with GUI actions and a scalable RL infrastructure to build more capable desktop agents. Experimental results show strong gains on the OSWorld benchmark, driven by the API-GUI paradigm and the Entropulse training method.
Midcentury pigeon experiments by B.F. Skinner inspired the associative learning ideas that underpin modern reinforcement learning, reshaping both AI and how scientists view animal intelligence.
ToolTrain teaches LLMs to use simple repository tools and combines SFT with tool-integrated RL to improve multi-hop issue localization, delivering state-of-the-art results on real-world benchmarks.
Nebius AI and Humanoid adapted DAPO-based reinforcement learning to train an open-weight Qwen2.5 agent for long-horizon software engineering, reaching 39% Pass@1 on SWE-bench Verified without teacher supervision.
ProRLv2 scales RL training to 3,000 steps and combines regularization and exploration techniques to expand reasoning capabilities in compact LLMs, showing strong benchmark gains across math, coding, logic and STEM tasks.
AI pricing tools can produce tacit-collusion-like outcomes, challenging traditional antitrust frameworks and prompting new enforcement, legislation, and transparency measures.
Graph-R1 combines hypergraph knowledge, agentic multi-turn retrieval, and end-to-end RL to deliver state-of-the-art QA accuracy and efficient generation.
ByteDance introduces Seed-Prover, a novel lemma-centric system that achieves breakthrough results in automated mathematical theorem proving, solving 5 out of 6 IMO 2025 problems and excelling across multiple benchmarks.
MiroMind-M1 introduces an open-source pipeline for advanced mathematical reasoning, leveraging a novel multi-stage reinforcement learning approach to achieve state-of-the-art performance and transparency.
Rubrics as Rewards (RaR) introduces a reinforcement learning approach that uses structured rubrics as reward signals, improving language model training in complex domains like medicine and science.
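As a rough sketch of the rubric-as-reward idea, a response can be scored as a weighted average of per-criterion judgments. The rubric items, weights, and judge interface below are illustrative assumptions, not RaR's published implementation.

```python
# Hypothetical rubric-scored reward: a weighted average of per-criterion
# judge scores in [0, 1]. Criteria, weights, and the judge are assumptions.
def rubric_reward(response: str, rubric: list[tuple[str, float]], judge) -> float:
    total = sum(weight for _, weight in rubric)
    return sum(weight * judge(criterion, response) for criterion, weight in rubric) / total

# Example rubric for a medical answer (illustrative only).
rubric = [
    ("States the correct diagnosis", 2.0),
    ("Cites supporting evidence", 1.0),
    ("Avoids unsafe recommendations", 2.0),
]
# `judge` would typically be an LLM grader returning a score in [0, 1].
```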
Alibaba introduces Qwen3-MT, a next-generation multilingual machine translation model featuring cutting-edge architecture and reinforcement learning for high-quality, cost-efficient translations across 92+ languages.
Master-RM is a new reward model designed to fix vulnerabilities in LLM-based evaluators by reducing false positives caused by superficial cues, ensuring more reliable reinforcement learning outcomes.
MemAgent introduces a reinforcement learning-based memory agent that allows large language models to process ultra-long documents efficiently, maintaining high accuracy with linear computational costs.
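The linear-cost trick is to read the document chunk by chunk while overwriting a fixed-size memory; the sketch below illustrates that loop, with `update_memory` standing in for the RL-trained memory policy (an assumption, not MemAgent's actual interface).

```python
# Hypothetical sketch of fixed-memory chunked reading: cost grows linearly
# with document length because the memory never grows past a fixed budget.
def read_long_document(doc: str, update_memory, chunk_size: int = 4096) -> str:
    memory = ""  # bounded scratchpad carried across chunks
    for start in range(0, len(doc), chunk_size):
        memory = update_memory(memory, doc[start:start + chunk_size])
    return memory  # final memory is what answers the query
```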
GLM-4.1V-Thinking is a cutting-edge vision-language model that pushes the boundaries of multimodal reasoning, setting new standards across various challenging AI tasks.
Mirage introduces a new method for vision-language models to integrate visual reasoning without generating images, significantly enhancing their ability to solve spatial and multimodal tasks.
Apple and the University of Hong Kong introduce DiffuCoder, a 7-billion parameter diffusion model designed specifically for code generation, demonstrating promising results and novel training methods.
MMSearch-R1 introduces a reinforcement learning framework that enables large multimodal models to perform efficient, on-demand searches by learning when and how to retrieve relevant information, significantly improving accuracy and reducing search overhead.
Embodied AI agents leverage world models to perceive and act in real or virtual environments, enhancing their autonomy and human-like interaction across various industries.
Salesforce AI releases GTA1, a powerful GUI agent that outperforms OpenAI's CUA by leveraging innovative test-time scaling and reinforcement learning techniques to improve task success and action grounding.
Meta and NYU developed a semi-online reinforcement learning method that balances offline and online training to enhance large language model alignment, boosting performance in both instruction-based and mathematical tasks.
AbstRaL uses reinforcement learning to teach LLMs abstract reasoning, significantly improving their robustness and accuracy on varied GSM8K math problems compared to traditional methods.
ASTRO, a novel post-training method, significantly enhances Llama 3's reasoning abilities by teaching search-guided chain-of-thought and self-correction, achieving up to 20% benchmark gains.
Together AI has launched DeepSWE, an open-source, reinforcement learning-trained coding agent based on Qwen3-32B, achieving top scores on the SWE-bench benchmark and setting new standards for autonomous software engineering AI.
ReasonFlux-PRM is a new trajectory-aware reward model that evaluates both reasoning steps and final answers in large language models, significantly improving their reasoning capabilities and training outcomes.
OMEGA is a novel benchmark designed to probe the reasoning limits of large language models in mathematics, focusing on exploratory, compositional, and transformational generalization.
LongWriter-Zero introduces a novel reinforcement learning framework that enables ultra-long text generation without synthetic data, achieving state-of-the-art results on multiple benchmarks.
Tencent introduces Hunyuan-A13B, a highly efficient open-source MoE language model with dual-mode reasoning and support for ultra-long 256K context lengths, achieving state-of-the-art benchmark results.
Unbabel introduces TOWER+, a unified multilingual large language model that excels in both high-fidelity translation and instruction-following, surpassing existing open-weight models in benchmarks.
Polaris-4B and Polaris-7B introduce a novel reinforcement learning recipe that scales reasoning capabilities efficiently, achieving state-of-the-art results on math benchmarks with smaller models.
GURU introduces a multi-domain reinforcement learning dataset and models that significantly improve reasoning abilities of large language models across six diverse domains, outperforming previous open models.
MIT and NUS researchers introduce MEM1, a reinforcement learning framework that enables language agents to efficiently manage memory during complex multi-turn tasks, outperforming larger models in speed and resource use.
ByteDance researchers introduce ProtoReasoning, a new framework leveraging logic-based prototypes to significantly improve reasoning and planning abilities in large language models across various domains.
PoE-World introduces a modular symbolic approach that surpasses traditional reinforcement learning methods in Montezuma’s Revenge with minimal data, enabling efficient planning and strong generalization.
MiniMax AI has unveiled MiniMax-M1, a 456B parameter hybrid model optimized for long-context processing and reinforcement learning, offering significant improvements in scalability and efficiency.
New research finds DeepSeek to be the AI chatbot most willing to engage in explicit sexual conversations, in contrast with stricter models like Claude and GPT-4o.
ReVisual-R1 is an innovative open-source 7B multimodal language model that advances complex reasoning by integrating a three-stage training pipeline with novel reinforcement learning techniques.
DeepCoder-14B is an open-source AI model designed for efficient and transparent code generation, matching proprietary models in performance while promoting collaboration and accessibility.
Internal Coherence Maximization (ICM) introduces a novel label-free, unsupervised training framework for large language models, achieving performance on par with human-supervised methods and enabling advanced capabilities without human feedback.
Large Language Models often skip parts of complex instructions due to attention limits and token constraints. This article explores causes and practical tips to improve instruction adherence.
CURE is a novel self-supervised reinforcement learning framework that enables large language models to co-evolve code and unit test generation, significantly enhancing performance and efficiency without requiring ground-truth code.
Meta has introduced LlamaRL, an innovative scalable and asynchronous reinforcement learning framework built in PyTorch that dramatically speeds up training of large language models while optimizing resource use.
NVIDIA introduces ProRL, a novel reinforcement learning method that extends training duration to unlock new reasoning capabilities in AI models, achieving superior performance across multiple reasoning benchmarks.
Shanghai AI Lab researchers propose entropy-based scaling laws and novel techniques to overcome exploration collapse in reinforcement learning for reasoning-centric large language models, achieving significant performance improvements.
MiMo-VL-7B is a powerful vision-language model developed by Xiaomi researchers, offering state-of-the-art performance in visual understanding and multimodal reasoning through advanced training techniques.
Researchers introduce Regularized Policy Gradient (RPG), a novel framework leveraging KL divergence in off-policy reinforcement learning to significantly improve reasoning and training stability in large language models.
Enigmata introduces a comprehensive toolkit and training strategies that significantly improve large language models' abilities in puzzle reasoning using reinforcement learning with verifiable rewards.
Apple and Duke researchers introduce Interleaved Reasoning, a reinforcement learning method that allows LLMs to produce intermediate answers, significantly boosting response speed and accuracy in complex tasks.
Qwen2.5-Math models improve math reasoning significantly even when trained with incorrect or random reward signals, highlighting unique reinforcement learning dynamics not seen in other models.
MMaDA is a novel unified multimodal diffusion model that excels in textual reasoning, visual understanding, and image generation, outperforming existing systems across multiple benchmarks.
Microsoft's Phi-4-reasoning demonstrates that high-quality, curated data can enable smaller AI models to perform advanced reasoning tasks as effectively as much larger models, challenging the notion that bigger models are always better.
QwenLong-L1 introduces a structured reinforcement learning approach enabling large language models to excel at long-context reasoning tasks, achieving state-of-the-art results on multiple benchmarks.
NVIDIA introduces Llama Nemotron Nano 4B, a compact open-source AI model optimized for edge deployment that outperforms larger models in scientific reasoning and programming tasks.
GRIT introduces a groundbreaking method for teaching multimodal large language models to jointly reason with images and text, significantly improving visual grounding and reasoning accuracy while requiring minimal training data.
Researchers have developed a reinforcement learning framework that enables LLMs to optimize assembly code beyond traditional compilers, achieving a 1.47× speedup and 96% correctness on thousands of real-world programs.
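A natural reward for this setup gates speedup on correctness: zero unless the optimized assembly passes all tests, otherwise the measured speedup over the compiler baseline. The sketch below illustrates that shape; the exact gating and baseline choice are assumptions about the paper's reward.

```python
# Hypothetical correctness-gated speedup reward for assembly optimization;
# the gating rule and baseline choice are illustrative assumptions.
def asm_reward(passes_all_tests: bool, baseline_secs: float, candidate_secs: float) -> float:
    if not passes_all_tests:
        return 0.0  # incorrect code earns nothing, however fast
    return baseline_secs / candidate_secs  # e.g. 1.47 means 1.47x faster

print(asm_reward(True, baseline_secs=1.47, candidate_secs=1.0))  # 1.47
```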
Researchers from the National University of Singapore developed Thinkless, a framework that dynamically adjusts reasoning depth in language models, cutting unnecessary computation by up to 90% while maintaining accuracy.
Researchers improve large language models' reasoning by explicitly aligning core abilities like deduction, induction, and abduction, surpassing traditional instruction-tuned models in accuracy and reliability.
RXTX is a novel machine learning-based algorithm that achieves faster and more efficient structured matrix multiplication, outperforming existing methods including recursive Strassen techniques.
NVIDIA introduces Cosmos-Reason1, a new suite of AI models designed to enhance physical common sense and embodied reasoning using multimodal learning and innovative ontologies, improving AI interaction in real-world environments.
Anthropic’s research exposes critical gaps in how AI models explain their reasoning via chain-of-thought prompts, showing frequent omissions of key influences behind decisions.
DanceGRPO introduces a unified reinforcement learning framework that enhances visual generation across multiple paradigms and tasks, significantly improving visual quality and alignment with human preferences.
NVIDIA's Joey Conway discusses groundbreaking open-source AI models Llama Nemotron Ultra and Parakeet, highlighting innovations in reasoning control, data curation, and rapid speech recognition.
New research shows that including toxic data in LLM pretraining improves the model's ability to be detoxified and controlled, leading to safer and more robust language models.
Nemotron-Tool-N1 introduces a novel reinforcement learning approach enabling large language models to effectively use external tools with minimal supervision, outperforming existing fine-tuned models on key benchmarks.
RLV introduces a unified framework that integrates verification into value-free reinforcement learning for language models, significantly improving reasoning accuracy and computational efficiency on mathematical reasoning benchmarks.
Alibaba’s ZeroSearch framework leverages reinforcement learning and simulated document generation to train language models for retrieval without relying on costly real-time search APIs, achieving performance comparable to or better than Google Search.
Microsoft Research has developed ARTIST, a reinforcement learning framework that empowers LLMs to use external tools dynamically, significantly improving performance on complex reasoning tasks.
Salesforce’s xGen-small offers a compact AI model delivering efficient long-context understanding with reduced costs and strong privacy, transforming enterprise AI workflows.
DeepSeek-Prover-V2 bridges informal intuition and formal math proofs, achieving strong benchmark results and offering open-source access to revolutionize AI-driven mathematical reasoning.
OpenAI launches Reinforcement Fine-Tuning on the o4-mini model, enabling developers to customize AI reasoning with precision using reinforcement learning techniques.
WebThinker is a new AI agent that empowers large reasoning models to autonomously search the web and generate detailed scientific reports, significantly improving performance on complex reasoning benchmarks.
NVIDIA, CMU, and Boston University researchers introduce Nemotron-CrossThink, a novel framework that expands reinforcement learning for large language models beyond math to multiple reasoning domains with improved accuracy and efficiency.
Researchers at UC Berkeley and UCSF have developed Adaptive Parallel Reasoning, a novel method that allows large language models to dynamically distribute inference tasks across parallel threads, enhancing reasoning performance without exceeding context window limits.
Researchers introduce StarPO-S and RAGEN frameworks, significantly improving stability and reasoning capabilities in training autonomous large language model agents for multi-turn interactive tasks.
Xiaomi's MiMo-7B is a compact language model that surpasses larger models in math and code reasoning through advanced pre-training and reinforcement learning strategies.
DeepSeek-AI released DeepSeek-Prover-V2, an open-source large language model designed for formal theorem proving using subgoal decomposition and reinforcement learning, achieving state-of-the-art results on multiple formal reasoning benchmarks.
Microsoft launched the Phi-4-Reasoning family, a set of 14B parameter open-weight models optimized for complex reasoning tasks. These models demonstrate competitive performance on math, planning, and coding challenges with transparent training and open access.
OpenPipe’s ART·E uses reinforcement learning to deliver faster, cheaper, and more accurate email question-answering, outperforming OpenAI’s o3 agent in key metrics.
USC researchers introduce Tina, a family of compact reasoning models that leverage LoRA and reinforcement learning to deliver strong multi-step reasoning performance at a fraction of typical training costs.
Skywork AI introduces R1V2, a cutting-edge multimodal reasoning model that blends hybrid reinforcement learning techniques to improve specialized reasoning and generalization, outperforming many open-source and proprietary models.